Dgfs-cl Comparing Computational Models of Selectional Preferences – Second-order Co-occurrence vs. Latent Semantic Clusters
Abstract
Selectional preferences (i.e., semantic restrictions on the realisation of predicate complements) are of great interest to research in Computational Linguistics, both from a lexicographic and from an applied perspective (with respect to data sparseness). This poster presents a comparison of three computational approaches to selectional preferences: (i) an intuitive distributional approach that uses second-order co-occurrence of predicates and complement properties; (ii) an EM-based clustering approach that models the strengths of predicate–noun relationships by latent semantic clusters (Rooth et al., 1999); and (iii) an extension of the latent semantic clusters that incorporates the MDL principle into the EM training, thus explicitly modelling the predicate–noun selectional preferences by WordNet classes (Schulte im Walde et al., 2008). Our work was motivated by two main questions. First, concerning the distributional approach, we were interested not only in how well the model describes selectional preferences, but also in which second-order properties are most salient. For example, a typical direct object of the verb drink is usually fluid, might be hot or cold, can be bought, might be bottled, etc. So are adjectives that modify nouns, or verbs that subcategorise nouns, salient second-order properties for describing the selectional preferences of direct objects? Second, we were interested in the actual comparison of the models: how does a very simple distributional model compare to much more complex approaches, especially model (iii), which explicitly incorporates selectional preferences?
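To make the idea behind approach (i) concrete, the following is a minimal sketch of a second-order co-occurrence profile. It is not the authors' implementation. The function name `second_order_profile`, the toy counts, and the weighting (product of verb–noun and noun–property frequencies, normalised per verb) are all assumptions chosen for illustration only.

```python
from collections import Counter, defaultdict

# Toy counts, invented purely for illustration (not data from the paper).
# (verb, direct-object noun) -> frequency
verb_object = {
    ("drink", "tea"): 4,
    ("drink", "water"): 6,
    ("read", "book"): 5,
}
# (noun, property) -> frequency; here the properties are adjectives
# that modify the noun, one of the property types the abstract mentions.
noun_property = {
    ("tea", "hot"): 3,
    ("tea", "bottled"): 1,
    ("water", "cold"): 4,
    ("water", "bottled"): 2,
    ("book", "long"): 2,
}


def second_order_profile(verb_object, noun_property):
    """Return, per verb, a normalised distribution over second-order properties.

    Assumed weighting: each property of a noun is credited to the verb in
    proportion to freq(verb, noun) * freq(noun, property).
    """
    # Index the properties by noun for quick lookup.
    props_by_noun = defaultdict(Counter)
    for (noun, prop), freq in noun_property.items():
        props_by_noun[noun][prop] += freq

    # Propagate noun properties to the verbs that take the noun as object.
    scores = defaultdict(Counter)
    for (verb, noun), freq in verb_object.items():
        for prop, prop_freq in props_by_noun[noun].items():
            scores[verb][prop] += freq * prop_freq

    # Normalise each verb's scores so they sum to 1.
    profiles = {}
    for verb, counter in scores.items():
        total = sum(counter.values())
        profiles[verb] = {prop: value / total for prop, value in counter.items()}
    return profiles


profiles = second_order_profile(verb_object, noun_property)
for prop, weight in sorted(profiles["drink"].items(), key=lambda kv: -kv[1]):
    print(f"{prop}: {weight:.2f}")
```

In this toy setting the verb drink is described by the adjectives of its direct objects rather than by the object nouns themselves, which is what "second-order" refers to here. Other property types, such as verbs that subcategorise the nouns, could be plugged in through the same `noun_property` table.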
Similar resources
Comparing Computational Models of Selectional Preferences - Second-order Co-Occurrence vs. Latent Semantic Clusters
This paper presents a comparison of three computational approaches to selectional preferences: (i) an intuitive distributional approach that uses second-order co-occurrence of predicates and complement properties; (ii) an EM-based clustering approach that models the strengths of predicate–noun relationships by latent semantic clusters; and (iii) an extension of the latent semantic clusters by i...
Computational Models for Chinese Selectional Preferences Induction
Selectional preference (SP) is an important kind of semantic knowledge. It can be used in various natural language processing tasks, including metaphor computing, lexicon building, syntactic structure disambiguation, word sense disambiguation, semantic role labeling, anaphora resolution, etc. This paper presents and compares two computational models for Chinese SP induction, a HowNet-based Sele...
Probabilistic Distributional Semantics with Latent Variable Models
We describe a probabilistic framework for acquiring selectional preferences of linguistic predicates and for using the acquired representations to model the effects of context on word meaning. Our framework uses Bayesian latent-variable models inspired by, and extending, the well-known Latent Dirichlet Allocation (LDA) model of topical structure in documents; when applied to predicate–argument ...
The Impact of Selectional Preference Agreement on Semantic Relational Similarity
Relational similarity is essential to analogical reasoning. Automatically determining the degree to which a pair of words belongs to a semantic relation (relational similarity) is greatly improved by considering the selectional preferences of the relation. To determine selectional preferences, we induced semantic classes through a Latent Dirichlet Allocation (LDA) method that operates on depend...
Latent Semantic Clustering of German Verbs with Treebank Data
Treebank data have been utilized as data sources for a wide range of tasks in computational linguistics, including statistical parsing, anaphora resolution, induction of valence lexica, etc. More recently, researchers have experimented with extracting semantic information from syntactically annotated data. Here, treebank data have been used for the purposes of identifying selectional preference...